Optimizing OpenCL Kernels for Iterative Statistical Applications on GPUs

نویسندگان

  • Thilina Gunarathne
  • Bimalee Salpitikorala
  • Arun Chauhan
  • Geoffrey Fox
چکیده

We present a study of three important kernels that occur frequently in iterative statistical applications: K-Means, MultiDimensional Scaling (MDS), and PageRank. We implemented each kernel using OpenCL and evaluated their performance on an NVIDIA Tesla GPGPU card. By examining the underlying algorithms and empirically measuring the performance of various components of the kernel we explored the optimization of these kernels by four main techniques: (1) caching invariant data in GPU memory across iterations, (2) selectively placing data in different memory levels, (3) rearranging data in memory, and (4) dividing the work between the GPU and the CPU. The optimizations resulted in performance improvements of up to 5X, compared to näıve OpenCL implementations. We believe that these categories of optimizations are also applicable to other similar kernels. Finally, we draw several lessons that would be useful in not only implementing other similar kernels with OpenCL, but also in devising code generation strategies in compilers that target GPGPUs through OpenCL.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

Heterogeneous multicore architectures with CPU and add-on GPUs or streaming processors are now widely used in computer systems. These GPUs provide substantially more computation capability and memory bandwidth compared to traditional multi-cores. Also, because they are highly programmable, they provide the computational performance needed for realistic graphics rendering. Applications with gene...

متن کامل

Iterative statistical kernels on contemporary GPUs

We present a study of three important kernels that occur frequently in iterative statistical applications: Multi-Dimensional Scaling (MDS), PageRank, and K-Means. We implemented each kernel using OpenCL and evaluated their performance on NVIDIA Tesla and NVIDIA Fermi GPGPU cards using dedicated hardware, and in the case of Fermi, also on the Amazon EC2 cloud-computing environment. By examining ...

متن کامل

OCLoptimizer: An Iterative Optimization Tool for OpenCL

Nowadays, computers include several computational devices with parallel capacities, such as multicore processors and Graphic Processing Units (GPUs). OpenCL enables the programming of all these kinds of devices. An OpenCL program consists of a host code which discovers the computational devices available in the host system and it queues up commands to the devices, and the kernel code which defi...

متن کامل

From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming

In this work, we evaluate OpenCL as a programming tool for developing performanceportable applications for GPGPU. While the Khronos group developed OpenCL with programming portability in mind, performance is not necessarily portable. OpenCL has required performance-impacting initializations that do not exist in other languages such as CUDA. Understanding these implications allows us to provide ...

متن کامل

Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the G...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011